
    Improved Adaptive Rejection Metropolis Sampling Algorithms

    Markov Chain Monte Carlo (MCMC) methods, such as the Metropolis-Hastings (MH) algorithm, are widely used for Bayesian inference. One of the most important issues for any MCMC method is the convergence of the Markov chain, which depends crucially on a suitable choice of the proposal density. Adaptive Rejection Metropolis Sampling (ARMS) is a well-known MH scheme that generates samples from one-dimensional target densities using adaptive piecewise proposals constructed from support points taken from rejected samples. In this work we pinpoint a crucial drawback in the adaptive procedure of ARMS: support points might never be added inside regions where the proposal is below the target. When this happens in many regions, it leads to poor performance of ARMS, with the proposal never converging to the target. To overcome this limitation, we propose two improved adaptive schemes for constructing the proposal. The first is a direct modification of the ARMS procedure that incorporates support points inside regions where the proposal is below the target, while satisfying the diminishing adaptation property, one of the conditions required to ensure the convergence of the Markov chain. The second is an adaptive independent MH algorithm with the ability to learn from all previous samples except for the current state of the chain, thus also guaranteeing convergence to the invariant density. These two new schemes improve the adaptive strategy of ARMS, thus reducing the complexity of constructing the proposals. Numerical results show that the new techniques provide better performance than standard ARMS.
    Comment: Matlab code provided at http://a2rms.sourceforge.net
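
To make the role of the proposal density concrete, here is a minimal sketch of a plain independent Metropolis-Hastings sampler with a fixed proposal. It is not ARMS and not either of the adaptive schemes described in the abstract; the function name `independent_mh`, its parameters, and the Gaussian target and proposal used in the example are all assumptions chosen for illustration.

```python
import math
import random


def independent_mh(log_target, sample_proposal, log_proposal, n_samples, x0, seed=0):
    """Independent MH: candidates are drawn without looking at the current state.

    log_target and log_proposal may be unnormalized log-densities, because
    normalizing constants cancel in the acceptance ratio.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        y = sample_proposal(rng)
        # Log acceptance ratio: [p(y) q(x)] / [p(x) q(y)]
        log_alpha = (log_target(y) - log_target(x)) + (log_proposal(x) - log_proposal(y))
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_alpha:
            x = y
        samples.append(x)
    return samples


# Hypothetical example: standard normal target, wider normal proposal (std 2).
def log_target(x):
    return -0.5 * x * x


def log_proposal(x):
    return -x * x / (2.0 * 2.0 ** 2)


def sample_proposal(rng):
    return rng.gauss(0.0, 2.0)


samples = independent_mh(log_target, sample_proposal, log_proposal, 20000, x0=0.0)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The closer the proposal is to the target, the closer the acceptance ratio is to one; adaptive schemes such as ARMS try to reach that regime by refining the proposal as sampling proceeds.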

    Multi-label classification using ensembles of pruned sets

    This paper presents a Pruned Sets method (PS) for multi-label classification. It is centred on the concept of treating sets of labels as single labels. This allows the classification process to inherently take into account correlations between labels. By pruning these sets, PS focuses only on the most important correlations, which reduces complexity and improves accuracy. By combining pruned sets in an ensemble scheme (EPS), new label sets can be formed to adapt to irregular or complex data. The results from experimental evaluation on a variety of multi-label datasets show that [E]PS can achieve better performance and train much faster than other multi-label methods.
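
The idea of treating each label set as a single label and pruning the rare ones can be sketched as follows. This is only an illustration of the pruning step as the abstract describes it; the function name `prune_label_sets`, the `min_count` threshold, and the toy data are assumptions, and any further handling of the pruned examples is omitted.

```python
from collections import Counter


def prune_label_sets(label_sets, min_count):
    """Keep only examples whose full label set occurs at least min_count times.

    Returns a list of (example_index, label_set) pairs, where each label set
    is a frozenset that can serve as a single class label.
    """
    keyed = [frozenset(s) for s in label_sets]
    counts = Counter(keyed)
    return [(i, s) for i, s in enumerate(keyed) if counts[s] >= min_count]


# Hypothetical toy data: five examples with their label sets.
label_sets = [{"a", "b"}, {"a", "b"}, {"c"}, {"a"}, {"a"}]
kept = prune_label_sets(label_sets, min_count=2)
kept_indices = [i for i, _ in kept]
```

After this transformation, any single-label multi-class learner can be trained on the kept examples, with each distinct frozenset acting as one class.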

    Scikit-Multiflow: A Multi-output Streaming Framework

    Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived to serve as a platform to encourage democratization of stream learning research, it provides multiple state-of-the-art methods for stream learning, stream generators and evaluators. Scikit-multiflow builds upon popular open-source frameworks including scikit-learn, MOA and MEKA. Development follows the FOSS principles and quality is enforced by complying with PEP8 guidelines and using continuous integration and automatic testing. The source code is publicly available at https://github.com/scikit-multiflow/scikit-multiflow.
    Comment: 5 pages, Open Source Software

    Efficient multi-label classification for evolving data streams

    Many real-world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory. This paper proposes a new experimental framework for studying multi-label evolving stream classification, and new efficient methods that combine the best practices in streaming scenarios with the best practices in multi-label classification. We present a Multi-label Hoeffding Tree with multi-label classifiers at the leaves as a base classifier. We obtain fast and accurate methods that are well suited for this challenging multi-label classification streaming task. Using the new experimental framework, we test our methodology by performing an evaluation study on synthetic and real-world datasets. In comparison to well-known batch multi-label methods, we obtain encouraging results.

    Evaluation methods and decision theory for classification of streaming data with temporal dependence

    Predictive modeling on data streams plays an important role in modern data analysis, where data arrives continuously and needs to be mined in real time. In the stream setting the data distribution is often evolving over time, and models that update themselves during operation are becoming the state of the art. This paper formalizes a learning and evaluation scheme for such predictive models. We theoretically analyze the evaluation of classifiers on streaming data with temporal dependence. Our findings suggest that the commonly accepted data stream classification measures, such as classification accuracy and the Kappa statistic, fail to diagnose cases of poor performance when temporal dependence is present; therefore, they should not be used as sole performance indicators. Moreover, classification accuracy can be misleading if used as a proxy for evaluating change detectors with datasets that have temporal dependence. We formulate the decision theory for streaming data classification with temporal dependence and develop a new evaluation methodology for data stream classification that takes temporal dependence into account. We propose a combined measure for classification performance that takes temporal dependence into account, and we recommend using it as the main performance measure in classification of streaming data.
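
One way to see why plain accuracy can mislead under temporal dependence is to compare a classifier against a "no-change" baseline that simply repeats the previous label. The sketch below is illustrative only and is not claimed to be the combined measure proposed in the abstract; the helper names, the normalization against the baseline, and the toy stream are all assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def persistent_predictions(y_true, first_guess):
    """No-change baseline: predict the previously observed label."""
    return [first_guess] + list(y_true[:-1])


def kappa_vs_baseline(p_model, p_baseline):
    """Improvement over a baseline accuracy, scaled so 1 is perfect and 0 matches the baseline."""
    if p_baseline >= 1.0:
        raise ValueError("baseline is already perfect; the ratio is undefined")
    return (p_model - p_baseline) / (1.0 - p_baseline)


# Hypothetical stream with long runs of the same label (temporal dependence).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
y_model = [0] * len(y_true)  # a classifier that always predicts the majority class

p_model = accuracy(y_true, y_model)                                # 8 of 12 correct
p_persist = accuracy(y_true, persistent_predictions(y_true, 0))    # 10 of 12 correct
score = kappa_vs_baseline(p_model, p_persist)
```

On this toy stream the majority-class model reaches about 67% accuracy, yet it scores below zero once it is compared with the no-change baseline, which gets 10 of 12 labels right without learning anything.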

    Uncovering the Spatial and Temporal Variability of Wind Resources in Europe: A Web-Based Data-Mining Tool

    We introduce REmap-eu.app, a web-based data-mining visualization tool for the spatial and temporal variability of wind resources. It uses the latest open-access dataset of the daily wind capacity factor in 28 European countries between 1979 and 2019 and proposes several user-configurable visualizations of the temporal and spatial variations of the wind power capacity factor. The platform allows for a deep analysis of the distribution, the cross-country correlation, and the drivers of low wind power events. It offers an easy-to-use interface that makes it suitable for the needs of researchers and stakeholders. The tool is expected to be useful in identifying areas of high wind potential and possible challenges that may impact the large-scale deployment of wind turbines in Europe. Particular importance is given to the visualization of low wind power events and to the potential of cross-border cooperation in mitigating the variability of wind in the context of increasing reliance on weather-sensitive renewable energy sources.
    Comment: visit Remap-eu.ap